Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time
Authors
Abstract
We derandomize G. Valiant’s [J. ACM 62 (2015) Art. 13] subquadratic-time algorithm for finding outlier correlations in binary data. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant’s randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders in Reingold, Vadhan, and Wigderson [Ann. of Math. 155 (2002) 157–187]. We say that a function $f : \{-1,1\}^d \to \{-1,1\}^D$ is a correlation amplifier with threshold $0 \le \tau \le 1$, error $\gamma \ge 1$, and strength $p$ an even positive integer if for all pairs of vectors $x, y \in \{-1,1\}^d$ it holds that (i) $|\langle x, y\rangle| < \tau d$ implies $|\langle f(x), f(y)\rangle| \le (\tau\gamma)^p D$; and (ii) $|\langle x, y\rangle| \ge \tau d$ implies $\left(\frac{\langle x, y\rangle}{\gamma d}\right)^p D \le \langle f(x), f(y)\rangle \le \left(\frac{\gamma\langle x, y\rangle}{d}\right)^p D$.
1998 ACM Subject Classification: F.2.1 Numerical Algorithms and Problems
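The definition of a correlation amplifier given in the abstract can be checked by brute force on small inputs. The sketch below does this for the p-fold tensor (Kronecker) power, a standard map with output dimension D = d^p that satisfies ⟨f(x), f(y)⟩ = ⟨x, y⟩^p and therefore meets both conditions with error γ = 1. This is only an illustration of the definition, not the explicit expander-based construction of the paper; the function names `inner`, `tensor_power`, and `is_amplifier_on` are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product


def inner(x, y):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(x, y))


def tensor_power(x, p):
    """Map x in {-1,1}^d to its p-fold tensor power in {-1,1}^(d**p)."""
    out = [1]
    for _ in range(p):
        out = [a * b for a in out for b in x]
    return out


def is_amplifier_on(f, vectors, d, D, tau, gamma, p):
    """Check conditions (i) and (ii) of the definition on all pairs in `vectors`."""
    tau, gamma = Fraction(tau), Fraction(gamma)
    for x in vectors:
        for y in vectors:
            s = inner(x, y)
            t = inner(f(x), f(y))
            if abs(s) < tau * d:
                # Condition (i): small correlations stay small.
                if abs(t) > (tau * gamma) ** p * D:
                    return False
            else:
                # Condition (ii): large correlations are amplified within bounds.
                lower = (s / (gamma * d)) ** p * D
                upper = (gamma * s / d) ** p * D
                if not (lower <= t <= upper):
                    return False
    return True


# Assumed toy parameters: d = 4, p = 2, so D = d**p = 16.
d, p = 4, 2
all_vectors = list(product([-1, 1], repeat=d))
ok = is_amplifier_on(lambda v: tensor_power(v, p), all_vectors,
                     d, d ** p, Fraction(1, 2), 1, p)
print(ok)
```

The tensor power is impractical for large p because D grows as d^p; per the abstract, the paper's contribution is an explicit amplifier family built from expanders, which this sketch does not attempt to reproduce.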
Similar resources
A Faster Subquadratic Algorithm for Finding Outlier Correlations
We study the problem of detecting outlier pairs of strongly correlated variables among a collection of n variables with otherwise weak pairwise correlations. After normalization, this task amounts to the geometric task where we are given as input a set of n vectors with unit Euclidean norm and dimension d, and we are asked to find all the outlier pairs of vectors whose inner product is at least...
A statistical test for outlier identification in data envelopment analysis
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these “deterministic” frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased estimation of parameters, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity
Point location problems for n points in d-dimensional Euclidean space (and $\ell_p$ spaces more generally) have typically had two kinds of running-time solutions: (Nearly-Linear) less than d · n log n time, or (Barely-Subquadratic) $f(d) \cdot n^{2-1/\Theta(d)}$ time, for various f. For small d and large n, “nearly-linear” running times are generally feasible, while the “barely-subquadratic” times are generally in...